Abstract —The web has played an important role in people’s social lives since the emergence of Web 2.0. It facilitates interaction between users, giving them the possibility to freely interact, share and collaborate through social networks, online community forums, blogs, wikis and other online collaborative media. However, the web also has a negative side, such as the posting of inflammatory messages. Thus, when dealing with online community forums, managers always seek to enhance the performance of such platforms. In fact, to keep the serenity and prevent disturbance of the normal atmosphere, managers always try to warn novice users against these malicious persons by posting messages such as "DO NOT FEED TROLLS". However, this kind of warning is not enough to reduce this phenomenon. In this context, we propose a new approach for detecting malicious people, also called 'Trolls', in order to allow community managers to take away their ability to post online. To be more realistic, our proposal is defined within an uncertain framework. Based on the assumption that trolls integrate into successful discussion threads, we try to detect the presence of such malicious users. This method is based on a conflict measure of belief function theory applied between the different messages of the thread. In order to show the feasibility and the results of our approach, we test it on different simulated data.

Keywords —Q&AC, trolls, belief function theory, conflict measure. 


I. Introduction 

The way we look for and acquire information has shifted greatly toward an instant, easy and low-cost process. Thanks to the Internet, one can search any given topic and get a huge amount of information with a simple click. However, for some problems it is difficult to get satisfactory answers by searching directly on a traditional search engine; instead, we prefer to find someone who has expertise or experience. One of the tools that has widened the scope of information exchange in this way is Question Answering Communities (Q&AC). These systems allow everyone to contribute as much as they can to a given community. Unfortunately, not all messages can be considered reliable: some users claim to be experts, and other people post messages without any utility for the one who is seeking answers. Thus, the managers of these communities always seek to enhance the performance of such platforms. Moreover, the increase of useless messages can be attributed to the presence of trolls. The term trolling has been defined in several works within different communities, including [2], [6] and [17]. These malicious people intend to insidiously derail the subject of the discussion in order to provoke controversy and disrupt the discussion. They aim to make normal users fall into their traps by deviating them from the main topic of the discussion. In fact, the only ways to deal with a troll are to ignore him, or to detect his presence in order to notify him or take away his ability to post online.

Some other works have tried to detect not just the characteristics of trolls but also their presence, in order to avoid them. To address this problem, Cambria et al. [14] proposed a technique based on semantic computing to automatically detect and check web trolls. This work aims to prevent malicious people from emotionally hurting other users or communities within the same social network. In another work, Ortega et al. [15] proposed a method to classify users in a social network according to their trustworthiness. The goal of their method is to distinguish trolls from the other users by preventing such malicious users from gaining a high reputation in the network. Patxi et al. [16] dealt with trolling users on the Twitter social network. These studies explored different social networks within a certain framework, i.e., without modeling the uncertainty of the data.

When dealing with real-world applications, massive amounts of data are inseparably connected with imperfection: such data can be imprecise, uncertain or even missing. Different theories have emerged to deal with this kind of data, such as fuzzy set theory [21], possibility theory [22] and belief function theory [1]. Thus, to be closer to reality and to obtain more relevant results, we propose a new method dealing with uncertain data. This method aims to detect trolls in Q&AC using the framework of belief functions.

The paper proceeds as follows: in Section II, we introduce Q&AC and briefly review related works. In Section III, we present the necessary background on the theory of belief functions. In Section IV, we recall the inclusion-based conflict measure, and in Section V we define the different steps of our proposal based on this measure. Finally, in Section VI, we show the feasibility of the proposed method on simulated examples.

II. Q&AC: Quick overview 

In this section we introduce some concepts related to Q&AC. We start by presenting the main actors in these forums, then give a brief overview of source identification, and finally describe the levels of uncertainty we can face in Q&AC.

A. Users within Q&AC 

Users are considered the main actors within Question Answering Communities. We can distinguish different types, such as experts, trolls and learners.

• Reliable user / Expert: a person who is very knowledgeable about or skilful in a particular area.

• Troll: a person who seeks to disturb the serenity of the concerned community. His purpose is to create controversial debates by multiplying irrelevant messages that are best left unanswered.

• Learner: a normal user of the Question Answering Community, trying to gain information and expertise.



B. Source Identification within Q&AC

Several studies have explored this field, trying to evaluate sources of information in Q&AC. For instance, Bouguessa et al. [7] proposed a model to identify authoritative users based on the number of best answers they provided. A best answer is selected either by the asker or by other users via a voting procedure. In [12], the author focused on the selection of the questions a user would choose to answer. According to this study, experts prefer answering questions where they have a higher chance of making a valuable contribution. Recently, in [13], the authors proposed a framework for evaluating both the reliability and the expertise of an information provider. Considering some cognitive and behavioral criteria of the users, they were able to establish a trust system. Using a response matrix summarizing the interactions between pairs of persons, each one is capable of estimating and providing an opinion. Using subjective logic to aggregate these evaluations, they then provided a global reliability and expertise value for each user within Q&AC.


C. Uncertainty within Q&AC 

When dealing with information provided by humans, we face several levels of uncertainty. Gjergji et al. [11] proposed three levels for Q&AC: the first is related to the extraction and integration of uncertainty, the second deals with the uncertainty of information sources, and the third concerns the inherent uncertainty of the knowledge related to the information itself. In our case, we are more interested in the evaluation of the sources and the part of uncertainty related to them. The main issue in these communities is that we do not always have a priori knowledge about the users: we know nothing about the sources’ credibility, reliability, relevance, objectivity and expertise. In this context, we exploit the mathematical background and the wide range of specificities provided by the theory of belief functions to address this problem from an uncertain point of view.


III. Theory of Belief Functions 

This section recalls the necessary background related to belief function theory. It was developed by Dempster in his work on upper and lower probabilities [1]. Based on that, he was able to represent the observed data more precisely.

A belief function must take into consideration all the possible events on which a source can express a belief. Based on that, we can define the frame of discernment.


A. Frame of discernment 

It is a finite set of disjoint elements, denoted $\Omega = \{\omega_1, \ldots, \omega_n\}$. Unlike probability theory, which assigns mass only to singletons, this theory allows us to assign a mass to a set of hypotheses. Thus, we are able to represent ignorance, imprecision, etc.


B. Basic belief assignment (bba) 

A bba is defined on the set of all subsets of $\Omega$, called the power set and denoted $2^\Omega$. It assigns a real value in $[0, 1]$ to every subset of $2^\Omega$, reflecting the source’s amount of belief in this subset. A bba $m$ satisfies:

$$\sum_{X \subseteq \Omega} m(X) = 1. \tag{1}$$

Any subset $X \in 2^\Omega$ with a positive elementary mass $m(X) > 0$ is called a focal element.
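As an illustration only, the definitions above can be sketched in Python by encoding subsets of $\Omega$ as `frozenset` objects and a bba as a dictionary from subsets to masses. This encoding and the helper names (`power_set`, `is_valid_bba`, `focal_elements`) are our own assumptions, not part of the theory.

```python
import math
from itertools import combinations


def power_set(frame):
    """All subsets of the frame of discernment (2^Omega), as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_valid_bba(m, frame, tol=1e-9):
    """Eq. (1): each mass is in [0, 1], sits on a subset of Omega, and the masses sum to 1."""
    omega = frozenset(frame)
    if not all(subset <= omega and 0.0 <= mass <= 1.0 for subset, mass in m.items()):
        return False
    return math.isclose(sum(m.values()), 1.0, abs_tol=tol)


def focal_elements(m):
    """Focal elements: the subsets X with m(X) > 0."""
    return [x for x, mass in m.items() if mass > 0]


# Usage: a source that partly believes w1, hesitates between w1 and w2,
# and leaves the rest of its belief on Omega itself (ignorance).
frame = {"w1", "w2", "w3"}
m = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.3, frozenset(frame): 0.2}
print(len(power_set(frame)))    # 8
print(is_valid_bba(m, frame))   # True
```

Putting mass on $\Omega$ itself, as in `m` above, is how total ignorance is expressed; a probability distribution would be the special case where every focal element is a singleton.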


C. Combination rules 

Many combination rules have been proposed, taking into consideration the nature of the sources.

1) Dempster’s combination rule: The first rule was proposed by Dempster in 1967 [1]; it is a normalized conjunctive combination rule, also called the orthogonal sum. Given two mass functions $m_1$ and $m_2$, for all $X \in 2^\Omega$, $X \neq \emptyset$, Dempster’s rule is defined by:

$$m_D(X) = (m_1 \oplus m_2)(X) = \frac{1}{1-k} \sum_{Y_1 \cap Y_2 = X} m_1(Y_1)\, m_2(Y_2) \tag{2}$$

and $m_D(\emptyset) = 0$, where $k = \sum_{Y_1 \cap Y_2 = \emptyset} m_1(Y_1)\, m_2(Y_2)$ is the inconsistency of the fusion (or of the combination), also called the conflict or global conflict. $1 - k$ is the normalization factor of the combination in a closed world.
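A minimal Python sketch of Eq. (2), assuming the same hypothetical encoding (subsets as `frozenset`, bbas as dictionaries); the function name `dempster` is ours. It accumulates the products $m_1(Y_1) m_2(Y_2)$ on $Y_1 \cap Y_2$, reads the conflict $k$ off the empty set, and renormalizes the remaining masses by $1 - k$.

```python
from collections import defaultdict


def dempster(m1, m2):
    """Eq. (2): return the combined bba m_D and the global conflict k."""
    conj = defaultdict(float)
    for y1, v1 in m1.items():
        for y2, v2 in m2.items():
            conj[y1 & y2] += v1 * v2           # accumulate on Y1 ∩ Y2
    k = conj.pop(frozenset(), 0.0)             # mass on the empty set is the conflict k
    if k >= 1.0:
        raise ValueError("total conflict (k = 1): Dempster's rule is undefined")
    return {x: v / (1.0 - k) for x, v in conj.items()}, k


# Usage on the frame {a, b}
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
m_d, k = dempster(m1, m2)
```

Here only $\{a\} \cap \{b\}$ is empty, so $k = 0.6 \times 0.5 = 0.3$ and, for example, $m_D(\{a\}) = 0.3 / 0.7 \approx 0.4286$.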

2) The conjunctive combination rule: In order to handle the open world, the conjunctive combination rule was introduced by Smets [9]. Given two mass functions $m_1$ and $m_2$, for all $X \in 2^\Omega$, $m_{conj}$ is defined by:

$$m_{conj}(X) = \sum_{Y_1 \cap Y_2 = X} m_1(Y_1)\, m_2(Y_2) \tag{3}$$

Unlike Dempster’s rule, it is not normalized, so the conflict remains on the empty set.

3) The disjunctive combination rule: First introduced by Dubois and Prade in 1986 [18], the disjunctive combination of two bbas $m_1$ and $m_2$ is defined for all $X \subseteq \Omega$ as follows:

$$m_{dis}(X) = \sum_{Y_1 \cup Y_2 = X} m_1(Y_1)\, m_2(Y_2) \tag{4}$$

The disjunctive combination rule can be used when at least one of the sources is reliable without knowing which one, or when we have no knowledge about their reliability.
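For comparison, the two unnormalized rules (3) and (4) can be sketched together in Python, again assuming the `frozenset`/dictionary encoding. The generic helper `combine` is hypothetical and simply takes the set operation: intersection for the conjunctive rule, union for the disjunctive one.

```python
import operator
from collections import defaultdict


def combine(m1, m2, set_op):
    """Add m1(Y1) * m2(Y2) to the subset set_op(Y1, Y2), without normalization."""
    out = defaultdict(float)
    for y1, v1 in m1.items():
        for y2, v2 in m2.items():
            out[set_op(y1, y2)] += v1 * v2
    return dict(out)


def conjunctive(m1, m2):
    """Eq. (3): intersections; the conflict stays on the empty set (open world)."""
    return combine(m1, m2, operator.and_)


def disjunctive(m1, m2):
    """Eq. (4): unions; no mass can reach the empty set from non-empty focal elements."""
    return combine(m1, m2, operator.or_)


A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
m_conj = conjunctive({A: 0.6, AB: 0.4}, {B: 0.5, AB: 0.5})
m_dis = disjunctive({A: 0.6, B: 0.4}, {A: 0.5, B: 0.5})
```

In `m_conj`, the mass $0.3$ produced by $\{a\} \cap \{b\}$ is kept on $\emptyset$ instead of being redistributed; in `m_dis`, disagreeing focal elements are merged into $\{a, b\}$, which receives $0.5$.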


IV. Inclusion as a Conflict Measure for Belief Functions


Recently, Martin [3] used a degree of inclusion to measure the conflict arising during the combination of two belief functions. He presented a binary inclusion index:

$$Inc(X_1, Y_2) = \begin{cases} 1, & \text{if } X_1 \subseteq Y_2, \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

with $X_1$ and $Y_2$ being respectively focal elements of $m_1$ and $m_2$. This index is then used to measure the degree of inclusion of one mass function in the other, defined as:

$$d_{inc}(m_1, m_2) = \frac{1}{|\mathcal{F}_1|\,|\mathcal{F}_2|} \sum_{X_1 \in \mathcal{F}_1} \sum_{Y_2 \in \mathcal{F}_2} Inc(X_1, Y_2) \tag{6}$$

where $|\mathcal{F}_1|$ and $|\mathcal{F}_2|$ are the numbers of focal elements of $m_1$ and $m_2$. He defines the degree of inclusion between $m_1$ and $m_2$, denoted $\delta_{inc}(m_1, m_2)$, as follows:

$$\delta_{inc}(m_1, m_2) = \max\big(d_{inc}(m_1, m_2),\, d_{inc}(m_2, m_1)\big) \tag{7}$$

where $d_{inc}(m_1, m_2)$ is the degree of inclusion of $m_1$ in $m_2$, and $d_{inc}(m_2, m_1)$ the converse. This inclusion degree is used in a conflict measure between two mass functions:

$$Conf(m_1, m_2) = \big(1 - \delta_{inc}(m_1, m_2)\big)\, d(m_1, m_2) \tag{8}$$

where $d(m_1, m_2)$ is the distance of Jousselme [10]:

$$d(m_1, m_2) = \sqrt{\tfrac{1}{2}\,(m_1 - m_2)^{T} D\, (m_1 - m_2)} \tag{9}$$

and $D$ is a matrix based on the Jaccard measure:

$$D(A, B) = \begin{cases} 1, & \text{if } A = B = \emptyset, \\ \dfrac{|A \cap B|}{|A \cup B|}, & \text{otherwise}, \end{cases} \qquad \forall A, B \in 2^\Omega \tag{10}$$
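The inclusion-based conflict of Eqs. (5)-(10) can be sketched in Python as follows. This is our own illustrative transcription, not Martin’s reference implementation: it assumes bbas are dictionaries keyed by `frozenset` subsets, uses $\subseteq$ for the inclusion test of Eq. (5), and evaluates the Jousselme quadratic form only over subsets carrying mass in at least one bba (the other entries of $m_1 - m_2$ are zero and contribute nothing).

```python
import math


def inc(x1, y2):
    """Eq. (5): 1 if X1 is a subset of Y2, else 0."""
    return 1 if x1 <= y2 else 0


def focal(m):
    return [x for x, v in m.items() if v > 0]


def d_inc(m1, m2):
    """Eq. (6): mean inclusion index over all pairs of focal elements."""
    f1, f2 = focal(m1), focal(m2)
    return sum(inc(x1, y2) for x1 in f1 for y2 in f2) / (len(f1) * len(f2))


def delta_inc(m1, m2):
    """Eq. (7): symmetric degree of inclusion."""
    return max(d_inc(m1, m2), d_inc(m2, m1))


def jaccard(a, b):
    """Eq. (10): Jaccard index between two subsets, with D(empty, empty) = 1."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jousselme(m1, m2):
    """Eq. (9): sqrt(0.5 * (m1 - m2)^T D (m1 - m2))."""
    subsets = list(set(m1) | set(m2))
    diff = {s: m1.get(s, 0.0) - m2.get(s, 0.0) for s in subsets}
    q = sum(diff[a] * diff[b] * jaccard(a, b) for a in subsets for b in subsets)
    return math.sqrt(max(0.0, 0.5 * q))        # clamp tiny negative round-off


def conf(m1, m2):
    """Eq. (8): Conf(m1, m2) = (1 - delta_inc(m1, m2)) * d(m1, m2)."""
    return (1.0 - delta_inc(m1, m2)) * jousselme(m1, m2)


A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
print(conf({A: 1.0}, {B: 1.0}))    # 1.0
print(conf({A: 1.0}, {AB: 1.0}))   # 0.0
```

Two categorical bbas on disjoint hypotheses give $\delta_{inc} = 0$ and $d = 1$, hence a conflict of 1. When one bba is included in the other, $\delta_{inc} = 1$ and the conflict vanishes, even though the distance $d = \sqrt{0.5}$ is not zero: the measure separates "different but compatible" from "contradictory".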


V. Trolls Identification Based on a Conflict Measure

Based on the assumption that trolls integrate into successful discussion threads, we propose a new method for detecting malicious people in online community forums. This approach is defined within the framework of belief functions: it is based on a conflict measure of this theory applied between the different messages of the thread. We can summarize our proposed method in three major steps, discussed in depth in the following.


A. Users’ messages

Hardaker proposed the primary characteristics of a troll [2] (aggression, deception, disruption, success). In 2014, Buckels et al. [6] specified the behavioral characteristics of a troll, describing trolls as persons exhibiting sadism, psychopathy and Machiavellianism. For them, trolling means behaving in a "deceptive, destructive or disruptive manner in social media".

In the context of this work, to distinguish between trolls and the other users, we manually extracted the characteristics of their responses from the answers and comments in different forums. Based on these characteristics, the content of a message can be off-topic, senseless or controversial. Using these characteristics, we define the frame of discernment that characterizes a message in a forum:

$$\Omega_{msg} = \{\text{Off-topic}, \text{Senseless}, 1, \ldots, N\} \tag{11}$$

• Senseless: the degree to which the response is empty of meaning.

• Off-topic: the degree to which the answer is irrelevant.

• $\{1, \ldots, N\}$: the topics, where $i \in \{1, \ldots, N\}$ is the relevant topic and $\{1, \ldots, N\} \setminus \{i\}$ are the controversial topics posted by a troll.

During this step, we assume that a method of analysis 
expresses a piece of evidence concerning the nature of each 
message. This method aims to analyze the messages relative 
to the posted question or topic. 


B. Users’ conflict

Detecting irrelevant messages does not necessarily mean that their author is a troll: the content of the messages alone cannot characterize trolls. A victim user may respond to a message posted by a troll, and the subject of the discussion can also drift gradually. Therefore, to distinguish between trolls and the other users of a community, we need to quantify how much a given user is in conflict with all the other users. Thus, we base our approach on measuring the conflict between the messages of each person posting answers. The list of notations is shown in Table I.

TABLE I
List of notations

Notation      Description
$U$           Users
$N$           Number of users
$NP$          Number of all the previous messages
$NP_j$        Number of the previous messages of a user $U_j$
$N_i$         Number of all messages posted by a user $U_i$
$N_j$         Number of all messages posted by a user $U_j$
$m_k$         Message of a user $U_i$
$m_s$         Message of a user $U_j$
Rank(m)       Rank of the message m
Tab1          Contains, at each step, the conflict of a message relative to each user
Tab2          Contains, at each step, the number of previous messages of a message
Tab3          Contains the total conflict of each user
$Conf_t$      Contains the sum of conflict of each user

Using the inclusion-based conflict measure for belief functions, for each user $U_i$ we measure:

• $Conf_{msg/u}$ measures the conflict between the message $m_k$ posted by $U_i$ and the messages posted before it by each other user $U_j$:

$$Conf_{msg/u}\big(m_k(U_i), m(U_j)\big) = \sum_{s=1}^{NP_j} Conf\big(m_k(U_i), m_s(U_j)\big), \quad i \neq j \tag{12}$$

• $Conf_{msg}$ measures the conflict between the message posted by $U_i$ and all the messages posted before it by all the other users $U$, based on a weighted mean. This measure takes into account the number of messages posted by every user in order to determine the level of conflict, especially between a troll and an expert:

$$Conf_{msg}\big(m_k(U_i), m(U)\big) = \frac{1}{NP} \sum_{j \neq i} Conf_{msg/u}\big(m_k(U_i), m(U_j)\big) \tag{13}$$

• $Conf_{user}$ measures the global conflict of the user $U_i$:

$$Conf_{user}(U_i) = \frac{1}{N_i} \sum_{k=1}^{N_i} Conf_{msg}\big(m_k(U_i), m(U)\big) \tag{14}$$
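The three measures can be chained as in the following Python sketch. It is hypothetical in several respects: the thread is assumed to be a chronological list of `(user, bba)` pairs, the pairwise conflict `conf` is passed in (for example Eq. (8)), $NP$ is taken to be the number of previous messages posted by users other than $U_i$, and a message with no previous message from another user is given a conflict of 0. The normalizations follow our reading of Eqs. (12)-(14).

```python
from collections import defaultdict


def user_conflicts(thread, conf):
    """Return Conf_user(U_i) for every user of a thread (our reading of Eqs. (12)-(14)).

    thread: chronological list of (user, bba) pairs (hypothetical input format).
    conf:   pairwise conflict between two bbas, e.g. Eq. (8).
    """
    conf_msg_of = defaultdict(list)          # user -> Conf_msg of each of their messages
    for k, (u_i, m_k) in enumerate(thread):
        conf_msg_u = defaultdict(float)      # Eq. (12): summed conflict per other user U_j
        np_total = 0                         # NP: previous messages of the other users (assumption)
        for u_j, m_s in thread[:k]:
            if u_j != u_i:
                conf_msg_u[u_j] += conf(m_k, m_s)
                np_total += 1
        # Eq. (13): divide by NP; no previous message from others gives 0 (assumption).
        conf_msg = sum(conf_msg_u.values()) / np_total if np_total else 0.0
        conf_msg_of[u_i].append(conf_msg)
    # Eq. (14): mean of Conf_msg over the N_i messages of each user.
    return {u: sum(vals) / len(vals) for u, vals in conf_msg_of.items()}


def toy_conf(a, b):
    """Toy stand-in for Eq. (8): 0 for identical labels, 1 otherwise."""
    return 0.0 if a == b else 1.0


thread = [("U1", "A"), ("U2", "A"), ("U3", "B"), ("U1", "A")]
print(user_conflicts(thread, toy_conf))  # {'U1': 0.25, 'U2': 0.0, 'U3': 1.0}
```

In this toy thread, $U_3$ contradicts both earlier messages and gets 1.0, while the second message of $U_1$ conflicts with one of its two predecessors (0.5), giving $U_1$ an average of $(0 + 0.5)/2 = 0.25$.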

The total conflict of a user can rise when this user launches into an interminable debate with a troll. In this case, the victim user in turn behaves like a troll. Thus, managers have to monitor the behavior of users across many discussion threads.

C. Users’ clustering 

The last step consists in classifying the users into two groups according to their conflict results. To make the decision, we base our approach on an unsupervised classification method: the k-means algorithm.

It was introduced by MacQueen [19] and implemented in its current form by Forgy [20]. The k-means algorithm aims to construct, from the objects of the training set, K compact partitions (clusters) that are well separated from each other. In our case, we divide the users into two partitions: K = 2. Since the conflict value of a troll is expected to be larger than that of the other users:

- Trolls belong to the group with the largest center value.

- The other users belong to the group with the smallest center value.
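A minimal sketch of this decision step in Python, using Lloyd’s iterations for $K = 2$ on the scalar $Conf_{user}$ values. Initializing the two centers at the minimum and maximum values is our own deterministic choice (Forgy’s initialization draws random observations instead), and `kmeans_two` and `detect_trolls` are hypothetical helper names.

```python
def kmeans_two(values, max_iter=100):
    """Lloyd's k-means with K = 2 on scalars; centers start at min and max (our choice)."""
    centers = [min(values), max(values)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        # Update step: each center becomes the mean of its members (kept if the cluster is empty).
        new_centers = []
        for c in (0, 1):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return labels, centers


def detect_trolls(conf_user):
    """Trolls are the users in the cluster whose center has the largest conflict."""
    users = list(conf_user)
    labels, centers = kmeans_two([conf_user[u] for u in users])
    troll = 0 if centers[0] > centers[1] else 1
    return {u for u, lab in zip(users, labels) if lab == troll}


# Usage on the total conflicts reported in Table II (Example 1).
table_ii = {"U1": 0.0610, "U2": 0.0639, "U3": 0.0489, "U4": 0.2030}
print(detect_trolls(table_ii))  # {'U4'}
```

With only one feature per user, k-means amounts to choosing a threshold between two groups of conflict values; if all users have the same conflict, both centers coincide and no one is flagged.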


VI. Experimentation 

To illustrate the behavior of our proposed method, we tested it on different simulated data. In this section, we present two different examples.


A. Example 1 

Our method has three main steps; we present the results of each step:

1) Users’ messages: Our assumption is that trolls integrate into successful discussion threads. From this point of view, we simulate the results of the message analysis as depicted in Figure 1. In this example, we try to detect a troll in a group of 4 users. The discussion thread contains 16 messages posted by different users, three of which are published by a troll. Our frame of discernment is composed of 4 elements: Relevant = $X_1$, Off-topic = $X_2$, Senseless = $X_3$, Controversy-topic = $X_4$. As shown in Figure 2, each row presents the owner of the message, the order of the message in the discussion thread and the mass function of this message (as mentioned in Section III-B, the masses of each bba must sum to 1).

In this example, the first message of the troll ($U_4$) is controversial: $m(X_4) = 0.9210$. His second message is empty of meaning: $m(X_3) = 0.9716$. His third message is controversial: $m(X_4) = 0.8387$.

2) Users’ conflict: Based on the inclusion-based conflict measure and applying our algorithm, we present the total conflict of each user of our example in Table II. $U_4$ has the largest conflict value. The total conflicts of users $U_1$ and $U_2$ are small relative to that of $U_4$, even though each of them responded to the first message of the troll with a controversial message; this result can be explained by the relevant messages these two users also published. $U_3$ has a small conflict value: he published three relevant messages, with $m(X_1) = 0.9732$ in his first message, $m(X_1) = 0.7782$ in his second, and $m(X_1) = 0.9632$ in his third.
Fig. 1. Simulation Results


TABLE II
Total conflict of each user

                $U_1$     $U_2$     $U_3$     $U_4$
$Conf_{user}$   0.0610    0.0639    0.0489    0.2030


3) Users’ clustering: Applying the k-means algorithm to the total conflict values of all users, we obtained two clusters:

- Trolls = {$U_4$}

- Other users = {$U_1$, $U_2$, $U_3$}

Our proposal provides a correct classification of the users. This result shows the feasibility of our proposed method.


B. Example 2 

For this simulation, we assume that we are dealing with 8 users, two of whom are trolls. The discussion thread contains 31 messages. The total conflict of each user, as expressed in Equation (14), is illustrated in figure

The first troll $U_4$ published 2 controversial messages, and the second troll $U_8$ published 3 messages: the first two are off-topic, and the last one is controversial.

- $U_1$ posted 3 relevant messages and 2 controversial messages in response to the first troll.

- $U_2$ posted 7 relevant messages and 2 controversial messages in response to the first troll.

- $U_3$ posted 4 relevant messages and one off-topic message in response to the second troll.

- $U_5$ posted one relevant message.

- $U_6$ published 3 relevant messages.

- $U_7$ published 2 relevant messages.

The total conflict of the troll $U_4$ is larger than that of the other troll $U_8$ because he published his posts after a large number of reliable messages provided by the other users, which created a higher conflict value. Applying the k-means algorithm, our method provides a correct classification:

- Trolls = {$U_4$, $U_8$}

- Other users = {$U_1$, $U_2$, $U_3$, $U_5$, $U_6$, $U_7$}

The users $U_1$, $U_2$ and $U_3$ are not classified among the trolls in spite of their posts that can be categorized as trolls’ messages. This result is explained by the fact that they also posted other relevant messages.
Fig. 2. The steps of detecting trolls


VII. Conclusion 

We proposed in this paper a new method for detecting 'Trolls' in Q&AC. Relying on this approach, managers can monitor the behavior of users in many discussion threads in order to notify them to stop trolling. Our work is defined within an uncertain framework. It is based on a conflict measure of belief function theory applied between the messages of the different users of the thread. First, this method analyzes the messages relative to the posted question or topic. However, detecting an irrelevant message is not enough to judge whether its author is a troll: trolls are characterized not only by the content of their messages but also by their behavior. Next, using the results of this analysis, we measured the conflict between the different users. Finally, after calculating the conflict of each user, we applied the k-means method in order to distinguish trolls from the other users, classifying the users into two clusters according to their conflict results. This method was tested on different simulated data to check its feasibility. Since our proposed method for detecting malicious users deals with only one discussion thread, we aim to extend this approach to detect trolls across the whole community.